ABSTRACT


Recently, fake news has been causing many problems for our society. As a
result, many researchers have been working on identifying fake news. Most
fake news detection systems use the linguistic features of the news.
However, they have difficulty detecting highly ambiguous fake news, which
can be identified only after understanding its meaning and the latest
related information. In this paper, to resolve this problem, we present a
new Korean fake news detection system using a fact DB, which is built and
updated by direct human judgement after collecting obvious facts. Our
system receives a proposition and searches the fact DB for semantically
related articles, in order to verify whether the given proposition is true
by comparing it with those articles. To achieve this, we use a deep
learning model, Bilateral Multi-Perspective Matching for Natural Language
Sentences (BiMPM), which has demonstrated good performance on sentence
matching tasks. However, BiMPM has some limitations: the longer the input
sentence, the lower its performance, and it has difficulty making accurate
judgements when unseen words or word relations appear. To overcome these
limitations, we propose a new matching technique that exploits article
abstraction as well as an entity matching set in addition to BiMPM. In our
experiments, we show that our system improves overall fake news detection
performance.

KEYWORDS: Fake news detection, Sentence matching, Natural Language
Processing, Deep learning, BiLSTM model, Machine Learning


How to cite this paper: Prasanth. K | Praveen. N | Vijay. S |
Auxilia Osvin Nancy. V "Fake News Detection using Machine
Learning" Published in International Journal of Trend in
Scientific Research and Development (ijtsrd), ISSN: 2456-6470,
Volume-4 | Issue-2, February 2020, pp.512-514,
www.ijtsrd.com/papers/ijtsrd30014.pdf

Copyright © 2019 by author(s) and International Journal of
Trend in Scientific Research and Development Journal. This is
an Open Access article distributed under the terms of the
Creative Commons Attribution License (CC BY 4.0)
(http://creativecommons.org/licenses/by/4.0)


INTRODUCTION

Fake news has been causing many problems for our society.
Fake news is news that its writer deliberately makes
misleading in order to gain a political or economic advantage
[1]. With the huge volume of Internet news and social media
content now being generated, it has become much more
difficult for individuals to identify fake news on their own.
Recently, many researchers have worked on fake news
detection systems that automatically determine whether an
opinion claimed in an article contains fake content [2].
Broadly, this research follows two approaches: one links the
linguistic patterns of news to deception, and the other
verifies deception using external knowledge [3]. The first
approach can verify fake news quickly and at low cost.
However, to detect cleverly crafted fake news, it is necessary
to grasp the semantic content of the article rather than
partial patterns, and to verify it against external facts
updated by humans. Therefore, given an input proposition, we
search the Fact DB for related articles and develop a fake news
detection system that verifies whether the retrieved articles
and the proposition are semantically related.

Related Works: 

Recently, as deep learning in the NLP field has advanced,
various sentence matching techniques have been introduced.
We review related research on sentence matching, dividing
it into unsupervised and supervised learning based works.


A. Unsupervised Learning: 

One of the most important elements in sentence matching is
how a word is represented as a data structure. The
traditional representation is the one-hot encoding vector.
However, this method needs as many dimensions as there are
words in the vocabulary to represent a single word, and it
cannot express relations between words. To overcome these
shortcomings, the word-to-vector (word2vec) method [5] was
proposed, which maps meaningful information into dense
vectors of a fixed dimension. word2vec learns weights that
increase the probability of nearby words appearing around a
center word, and uses the learned weights as the word's
vector. As an extension of word2vec, sentence-to-vector
(sent2vec) has also been proposed.
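The contrast between one-hot vectors and the word2vec training objective can be illustrated with a small sketch. It assumes a hypothetical four-word vocabulary and shows (a) that a one-hot vector needs one dimension per vocabulary word and that any two distinct words have cosine similarity 0, so no relation between words is captured, and (b) how (center, context) pairs are drawn from a sliding window; these pairs are what the skip-gram variant of word2vec learns to predict. Training the embeddings themselves is not shown.

```python
import math


def one_hot(word, vocab):
    """One-hot vector: length == len(vocab), a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: the nearby words skip-gram learns to predict
    from each center word, within `window` positions on either side."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


vocab = ["news", "article", "fake", "true"]  # hypothetical toy vocabulary
print(len(one_hot("news", vocab)))  # 4
print(cosine(one_hot("news", vocab), one_hot("article", vocab)))  # 0.0
print(skipgram_pairs(["fake", "news", "spreads"], window=1))
# [('fake', 'news'), ('news', 'fake'), ('news', 'spreads'), ('spreads', 'news')]
```

The one-hot vector grows with the vocabulary and is orthogonal for every pair of distinct words, whereas word2vec compresses the co-occurrence signal in these pairs into a fixed, much smaller dimension.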

B. Supervised Learning: 

Recently, research on machine comprehension has advanced
through the attention mechanism and BiLSTM. LSTM alleviated
the vanishing gradient problem of the Recurrent Neural
Network (RNN) by adding a gate that forgets past information
and stores current information in the cell. Since LSTM
handles sequential inputs, it is often used to encode and
decode sentences. However, as the input sequence becomes
longer, the model loses information and tends to remember
mainly the later information. Therefore, researchers improved
the performance of the existing LSTM through the attention
mechanism, which selectively recalls important information.
They also showed that using BiLSTM can further improve
performance. Bi-Directional Attention Flow (BiDAF) [7]
minimizes the loss of information by applying the attention
mechanism at each time step of the LSTM. In particular,
BiMPM [8] applied BiLSTM and the attention mechanism to
sentence matching.
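The LSTM gating described above and the core matching operation of BiMPM can be made concrete with a short NumPy sketch. `lstm_step` follows the standard LSTM equations (input, forget and output gates plus a candidate cell state); the additive cell update `c = f * c_prev + i * g` is what lets information and gradients survive across many time steps. `multi_perspective_cosine` follows the multi-perspective cosine function described for BiMPM, in which each row of a learned matrix reweights both vectors before a cosine similarity is taken. The full BiMPM model, with its BiLSTM context layer and four matching strategies, is not reproduced, and the random weights and dimensions are arbitrary assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4 * hidden, hidden + input_dim) and maps [h_prev; x] to the
    pre-activations of the input, forget and output gates and the candidate.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    i = sigmoid(z[:hidden])                # input gate: how much new content to write
    f = sigmoid(z[hidden:2 * hidden])      # forget gate: how much of c_prev to keep
    o = sigmoid(z[2 * hidden:3 * hidden])  # output gate: how much of the cell to expose
    g = np.tanh(z[3 * hidden:])            # candidate cell content
    c = f * c_prev + i * g                 # additive cell update
    h = o * np.tanh(c)
    return h, c


def multi_perspective_cosine(v1, v2, W):
    """BiMPM-style matching: row k of W (shape l x d) reweights both d-dim
    vectors elementwise, then a cosine similarity gives one score per row."""
    a = W * v1  # broadcasting (l, d) * (d,) -> (l, d)
    b = W * v2
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.maximum(den, 1e-12)  # guard against zero-length vectors


rng = np.random.default_rng(0)
hidden, dim = 4, 3
W_lstm = rng.normal(size=(4 * hidden, hidden + dim))
bias = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x in rng.normal(size=(5, dim)):  # a toy sequence of five input vectors
    h, c = lstm_step(x, h, c, W_lstm, bias)
print(h.shape)  # (4,)

Wm = rng.uniform(size=(2, hidden))  # two matching perspectives
print(multi_perspective_cosine(h, h, Wm))  # identical inputs, so both scores are ~1.0
```

In BiMPM the matrix `Wm` is learned, so each perspective can emphasize different dimensions of the contextual vectors when comparing two sentences.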

Proposed System: 

This section discusses the proposed solution for fake news
detection, which combines fake news detection with sentiment
analysis. The proposed solution, shown in Fig. 1, consists of
the following steps:



Step 1 : A merged dataset was prepared from different
         datasets, namely the Politifact, Kaggle and Emergent
         datasets.

Step 2 : Different text preprocessing techniques were applied,
         such as bigrams (sequences of two consecutive words
         taken from a given text), trigrams (sequences of three
         consecutive words taken from a given text),
         CountVectorizer (the count of each term in a text) and
         the term frequency-inverse document frequency (tf-idf)
         vectorizer.

Step 3 : We used a tf-idf vectorizer on a Twitter dataset,
         together with cosine similarity, to build our
         vocabulary. A Naive Bayes classifier was then used to
         predict the sentiment of each news statement in the
         test dataset (the merged dataset), as shown in Fig. 2.



Step 4 : We added three columns to the merged dataset: tf-idf
         scores, sentiments and cosine similarity scores.

Step 5 : Training models were built using Naive Bayes and
         Random Forest classifiers (train-test ratio 3:1).

Step 6 : Performance is evaluated and compared using 
accuracy. 
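Steps 2, 5 and 6 can be sketched end to end with the standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper does not say which Naive Bayes variant it used, so a multinomial Naive Bayes with add-one smoothing over unigram, bigram and trigram features is assumed; the Random Forest is omitted (it would normally come from a library, e.g. scikit-learn's `RandomForestClassifier`); and the toy statements and labels are hypothetical, not drawn from the Politifact, Kaggle or Emergent data.

```python
import math
import random
from collections import Counter


def ngrams(text, n_values=(1, 2, 3)):
    """Unigram, bigram and trigram features (Step 2), each joined with spaces."""
    tokens = text.lower().split()
    feats = []
    for n in n_values:
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


class MultinomialNB:
    """Minimal multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(ngrams(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_score(self, doc, c):
        v = len(self.vocab)
        s = self.log_prior[c]
        for f in ngrams(doc):
            if f in self.vocab:  # features never seen in training are ignored
                s += math.log((self.counts[c][f] + 1) / (self.totals[c] + v))
        return s

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_score(doc, c))


def train_test_split(docs, labels, test_ratio=0.25, seed=0):
    """Shuffle, then hold out test_ratio of the examples (0.25 gives a 3:1 split)."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_ratio))
    train, test = idx[:cut], idx[cut:]
    return ([docs[i] for i in train], [docs[i] for i in test],
            [labels[i] for i in train], [labels[i] for i in test])


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy statements and labels, for illustration only.
docs = ["shocking miracle cure revealed", "miracle cure they hide",
        "shocking secret cure revealed", "ministry reports quarterly budget",
        "parliament passes budget bill", "ministry publishes budget report",
        "secret miracle diet revealed", "parliament debates budget report"]
labels = ["fake", "fake", "fake", "real", "real", "real", "fake", "real"]

X_train, X_test, y_train, y_test = train_test_split(docs, labels)
model = MultinomialNB().fit(X_train, y_train)
print(accuracy(y_test, [model.predict(d) for d in X_test]))  # Step 6
```

Working in log space avoids numerical underflow when many small feature probabilities are multiplied, and add-one smoothing keeps an n-gram that never appeared with one class from zeroing out that class's score.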

Steps 2 to 4 form the key preprocessing stage of the proposed
solution. It uses a tf-idf vectorizer with cosine similarity
to tokenize a collection of text documents and to build a
vocabulary of known words. New documents are then encoded
using that vocabulary: each encoded vector has the length of
the entire vocabulary (bag of words) and holds an integer
count of the number of times each word appeared in the
document.
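The vocabulary building, bag-of-words encoding, tf-idf weighting and cosine similarity described above can be illustrated with a from-scratch sketch. It is written without any specific library; the smoothed idf formula log((1 + N) / (1 + df)) + 1 matches the default of scikit-learn's `TfidfVectorizer`, but the paper does not state which tf-idf variant it used, and the toy corpus is hypothetical. Words in a new document that are not in the vocabulary are dropped, so every encoded vector has exactly the length of the vocabulary.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def build_vocabulary(docs):
    """Assign each word seen in the corpus a fixed column index."""
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode_counts(doc, vocab):
    """Bag-of-words count vector of length len(vocab); unknown words are dropped."""
    vec = [0] * len(vocab)
    for tok in tokenize(doc):
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec


def idf_weights(docs, vocab):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    idf = [0.0] * len(vocab)
    for tok, j in vocab.items():
        idf[j] = math.log((1 + n) / (1 + df[tok])) + 1
    return idf


def tfidf(counts, idf):
    return [c * w for c, w in zip(counts, idf)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical toy corpus and new document.
corpus = ["the minister announced a new budget",
          "the budget was announced today",
          "celebrity spotted at the beach"]
vocab = build_vocabulary(corpus)
idf = idf_weights(corpus, vocab)

new_doc = "minister announced budget cuts"  # "cuts" is not in the vocabulary
counts = encode_counts(new_doc, vocab)
print(len(counts), sum(counts))  # 12 3
for doc in corpus:
    print(round(cosine(tfidf(counts, idf), tfidf(encode_counts(doc, vocab), idf)), 3), doc)
```

Weighting counts by idf lowers the influence of words that occur in many documents (such as "the"), so the cosine score is driven mostly by the more distinctive shared words.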

System Evaluation: 

In this section, we evaluate the performance of the proposed
system. Given an article relevant to the input proposition,
the evaluation measures the system's ability to determine
whether the semantic content of the proposition can be found
in that article. We train BiMPM, which is the foundation of
our system, and the experiments measure how much performance
improves when the previously proposed modules are added. We
first build our own dataset to train BiMPM to output true or
false when given a proposition and a short article of three
or four sentences. The dataset was constructed according to
the following policy:

1. Extract one sentence from a short article and use it as an
input proposition.

2. Generate true data through synonym substitution (using a
thesaurus), changes in word order, and omission of some
content.

3. Distort some information, such as numbers, nouns and
verbs, or omit words to generate false data.
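Part of this dataset-construction policy can be sketched as follows. This is a hypothetical illustration, not the authors' tooling: the `THESAURUS` dictionary stands in for whatever synonym resource was used, sentences are split naively on ".", and only synonym substitution (true data) and number distortion (false data) are implemented. Word-order changes, omissions and noun or verb distortions are left out of this sketch, since deciding whether such an edit preserves the meaning needs more than a simple rule.

```python
import random
import re

# Hypothetical thesaurus; the paper does not say which synonym resource was used.
THESAURUS = {"rose": "increased", "said": "stated", "about": "approximately"}


def synonym_variant(sentence):
    """True example (policy 2): replace words with thesaurus synonyms."""
    return " ".join(THESAURUS.get(w, w) for w in sentence.split())


def number_distortion(sentence, rng):
    """False example (policy 3): replace every number with a different number."""
    def bump(match):
        return str(int(match.group()) + rng.randint(1, 9))
    return re.sub(r"\d+", bump, sentence)


def build_examples(article, rng):
    """Turn one short article into labelled (article, proposition, label) triples."""
    sentences = [s.strip() for s in article.split(".") if s.strip()]  # naive split
    proposition = rng.choice(sentences)  # policy 1: one sentence becomes the proposition
    examples = [(article, proposition, True),
                (article, synonym_variant(proposition), True)]
    distorted = number_distortion(proposition, rng)
    if distorted != proposition:  # only possible when the sentence contains a number
        examples.append((article, distorted, False))
    return examples


rng = random.Random(0)
for _, prop, label in build_examples("Exports rose 12 percent.", rng):
    print(label, prop)
```

Because each false proposition differs from a true one only in a small detail, a model trained on such pairs is pushed to compare the content of the proposition with the article rather than rely on surface overlap.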
Fig. 2. Accuracy on the test set for each training epoch.

The sentences used in the new test set are longer and contain
new words that do not appear in the previous dataset. Results
are reported using the True Positive Rate (TPR) as the y-axis
and the False Positive Rate (FPR) as the x-axis.
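Plotting TPR against FPR while a decision threshold is varied gives a receiver operating characteristic (ROC) curve. The sketch below computes ROC points and the area under the curve (AUC) from classifier scores. The paper does not say how its curve or thresholds were produced, so the labels and scores in the example are hypothetical, and tied scores are not grouped (they are processed one at a time in input order).

```python
def roc_points(labels, scores):
    """(FPR, TPR) points obtained by lowering the decision threshold past one
    example at a time. labels: 1 = positive, 0 = negative."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    for score, label in sorted(zip(scores, labels), key=lambda p: -p[0]):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


labels = [1, 0, 1, 0]         # hypothetical ground truth
scores = [0.9, 0.8, 0.7, 0.1]  # hypothetical model confidences
pts = roc_points(labels, scores)
print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(pts))  # 0.75
```

An AUC of 0.75 here equals the fraction of (positive, negative) pairs in which the positive example receives the higher score (3 of 4).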

Conclusion: 

In this paper, we have proposed a fake news detection system
using machine learning and a fact DB that is built and updated
by direct human judgement. Our system receives a proposition
to verify as input and searches for related articles, so that
it can verify whether the articles found for that proposition
semantically agree with it. To achieve this, we utilized
BiMPM, a deep learning model for sentence matching, together
with machine learning. However, even though BiMPM has shown
good performance on various datasets, it has some
limitations: the longer the input sentence, the lower its
performance, and it has difficulty making accurate judgements
when unseen words or word relations appear. To overcome these
limitations, we have presented a new matching technique that
makes use of article abstraction and an entity matching set
in addition to BiMPM. In our experiments, we have shown that
our system improves overall fake news detection performance.

@ IJTSRD | Unique Paper ID - IJTSRD30014 | Volume - 4 | Issue - 2 | January-February 2020

International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470